17. Measuring Performance

Before we move forward, let's take a quick look at the lab you will be working on at the end of this lesson. You can check out the lab here.

As you look through the code, you will come across a cell where we are splitting our dataset into separate, smaller datasets.

# Get randomized datasets for training and validation
from sklearn.model_selection import train_test_split

train_features, valid_features, train_labels, valid_labels = train_test_split(
    train_features,
    train_labels,
    test_size=0.05,
    random_state=832289)

In the next video and the upcoming sections, we will discuss why this split matters and how it affects our model.

Overfitting is a common problem in machine and deep learning, and it is an important concept to understand and avoid when developing a model that generalizes well to new (test) data. Holding out a validation set is the recommended way to tune and evaluate your model during development without ever letting it learn from the test set. In the upcoming lessons, you will learn more specific techniques to tackle overfitting.
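To make this concrete, here is a minimal sketch (not the lab's actual code) of how a validation set helps you detect overfitting: train a model, then compare its accuracy on the training data against its accuracy on the held-out validation data. The synthetic dataset and the decision tree model are assumptions chosen for illustration; the split parameters mirror the cell above.

```python
# Minimal sketch: using a validation set to detect overfitting.
# The synthetic data and decision tree are illustrative stand-ins
# for the lab's dataset and model.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the lab's dataset.
features, labels = make_classification(n_samples=1000, random_state=0)
train_features, valid_features, train_labels, valid_labels = train_test_split(
    features,
    labels,
    test_size=0.05,
    random_state=832289)

# An unconstrained decision tree can memorize the training set.
model = DecisionTreeClassifier(random_state=0)
model.fit(train_features, train_labels)

train_acc = model.score(train_features, train_labels)
valid_acc = model.score(valid_features, valid_labels)

# A large gap between training and validation accuracy signals overfitting.
print(f"train accuracy: {train_acc:.3f}  validation accuracy: {valid_acc:.3f}")
```

Because the validation set was never seen during training, its accuracy is an honest estimate of how the model will behave on new data, while the test set stays untouched until the very end.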

Let's see how big your validation and test set need to be in the next section.